
Enable A365 tracing and fix W3C baggage propagation in agentserver #46754

Merged
ankitbko merged 59 commits into main from feature/enable-a365-tracing on May 19, 2026

@singankit
Contributor

Summary

Enable Agent365 (A365) tracing in the agentserver packages and fix W3C baggage propagation so that incoming baggage entries (e.g. user.id) are visible to span processors on all spans.

Changes

A365 Tracing Enablement (agentserver-core)

  • Gate A365 export behind FOUNDRY_AGENT365_TRACING_ENABLED env var
  • Add agent identity resolvers (agent_id, blueprint_id, tenant_id) from env vars
  • Enable a365_enable_observability_exporter and a365_observability_scope_override
  • Wire through _FoundryEnrichmentSpanProcessor with span attributes

Streaming Context Fix (responses)

  • Capture full OTel context (span + baggage) at wrap time in _wrap_streaming_response
  • Re-attach during async iteration so baggage is available after the handler's finally block

W3C Baggage Propagation Fix (responses + invocations)

  • Use W3CBaggagePropagator().extract() to extract only baggage from incoming headers
  • Merge extracted baggage onto get_current() before adding server entries
  • This preserves span parent-child relationships while capturing incoming baggage like user.id

Tests

  • 3 baggage propagation tests for responses package
  • 3 baggage propagation tests for invocations package
  • Coverage: baggage merging, span parenting preserved, empty header safety

Root Cause

start_as_current_span(context=extracted_ctx) inside request_span uses extracted_ctx only to choose the new span's parent. When it makes the span current, it attaches a context built from get_current() rather than from extracted_ctx, so baggage carried in the extracted context never becomes visible through get_current() in the endpoint handler. Only entries explicitly added on top of get_current() survive. The fix extracts baggage separately at the endpoint handler level.

singankit and others added 7 commits May 4, 2026 13:27
Conditionally enable A365 observability export via microsoft-opentelemetry
distro when both FOUNDRY_HOSTING_ENVIRONMENT and
FOUNDRY_AGENT365_TRACING_ENABLED env vars are set. Uses S2S endpoint
for token resolution in hosted environments.
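The gate can be sketched as a pure function over the environment. should_enable_a365_export is a hypothetical helper, not the package's code, and accepting only 1/true/yes/on as "enabled" is an assumption, since the commit only says both variables must be set.

```python
import os
from typing import Mapping

# Assumption: spellings accepted as "enabled" for the tracing flag.
_TRUTHY = frozenset({"1", "true", "yes", "on"})


def should_enable_a365_export(env: Mapping[str, str] = os.environ) -> bool:
    # Hosted: any non-empty FOUNDRY_HOSTING_ENVIRONMENT value.
    hosted = bool(env.get("FOUNDRY_HOSTING_ENVIRONMENT", "").strip())
    # Opt-in flag for Agent365 export.
    flag = env.get("FOUNDRY_AGENT365_TRACING_ENABLED", "").strip().lower()
    return hosted and flag in _TRUTHY


print(should_enable_a365_export({
    "FOUNDRY_HOSTING_ENVIRONMENT": "hosted",
    "FOUNDRY_AGENT365_TRACING_ENABLED": "true",
}))
```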

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…hment

- Add resolve_agent_id() with FOUNDRY_AGENT_INSTANCE_CLIENT_ID env var
  (falls back to name:version or name)
- Add resolve_agent_blueprint_id() with FOUNDRY_AGENT_BLUEPRINT_CLIENT_ID
- Add resolve_agent_tenant_id() with FOUNDRY_AGENT_TENANT_ID
- Wire all three through _FoundryEnrichmentSpanProcessor
- Make processor __init__ keyword-only
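The three resolvers can be sketched as follows. The function names and environment variables come from the commit; the explicit name/version parameters, the injectable env mapping and treating blank values as unset are assumptions, and the real resolvers may obtain the agent name and version differently.

```python
import os
from typing import Mapping, Optional


def _env_value(env: Mapping[str, str], name: str) -> Optional[str]:
    # Assumption: unset and blank variables are treated the same way.
    value = env.get(name, "").strip()
    return value or None


def resolve_agent_id(
    name: Optional[str],
    version: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    # Prefer the instance client ID provisioned for the hosted agent.
    client_id = _env_value(env, "FOUNDRY_AGENT_INSTANCE_CLIENT_ID")
    if client_id:
        return client_id
    # Fall back to "name:version", then to the bare name.
    if name and version:
        return f"{name}:{version}"
    return name or None


def resolve_agent_blueprint_id(env: Mapping[str, str] = os.environ) -> Optional[str]:
    return _env_value(env, "FOUNDRY_AGENT_BLUEPRINT_CLIENT_ID")


def resolve_agent_tenant_id(env: Mapping[str, str] = os.environ) -> Optional[str]:
    return _env_value(env, "FOUNDRY_AGENT_TENANT_ID")


print(resolve_agent_id("weather-agent", "3", env={}))
```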

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ator

The streaming async generator runs after the request handler's finally
block detaches baggage. Fix by capturing the full OTel context (including
baggage) at wrap time and re-attaching it during iteration, so child spans
created during streaming can see baggage entries like conversation_id.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extract incoming baggage (e.g. user.id) using W3CBaggagePropagator
without re-extracting traceparent, preserving parent-child span
relationships while making caller's baggage entries visible to
downstream span processors.

Also removes stale flask/sqlalchemy imports from prior attempts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…kages

- Apply same baggage extraction fix to invocations/_invocation.py
- Add 3 baggage propagation tests for invocations package
- Add 3 baggage propagation tests for responses package
- Tests verify: baggage merging, span parenting preserved, empty header safety

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@singankit singankit marked this pull request as ready for review May 6, 2026 14:51
@singankit singankit requested a review from ankitbko as a code owner May 6, 2026 14:51
Copilot AI review requested due to automatic review settings May 6, 2026 14:51
@github-actions github-actions Bot added the Hosted Agents sdk/agentserver/* label May 6, 2026
singankit and others added 16 commits May 6, 2026 09:50
Server-added entries (response_id) are set after the span starts, so the
on_start processor won't see them. The test should only verify incoming
baggage merging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ponse'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…an start

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Thread enable_sensitive_data kwarg from AgentServerHost through
configure_observability -> _configure_tracing -> _setup_distro_export
-> use_microsoft_opentelemetry so Agent Framework SDK records prompts,
tool arguments, and results.

Defaults to True; set FOUNDRY_ENABLE_SENSITIVE_DATA=false to opt out.
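The default-on behaviour can be sketched as a small resolver. resolve_enable_sensitive_data is a hypothetical name; letting an explicit enable_sensitive_data kwarg override the environment variable, and matching "false" case-insensitively, are assumptions rather than documented behaviour.

```python
import os
from typing import Mapping, Optional


def resolve_enable_sensitive_data(
    explicit: Optional[bool] = None,
    env: Mapping[str, str] = os.environ,
) -> bool:
    # Assumption: an explicit enable_sensitive_data kwarg wins over the env var.
    if explicit is not None:
        return explicit
    raw = env.get("FOUNDRY_ENABLE_SENSITIVE_DATA", "").strip().lower()
    # Default is True; only "false" opts out.
    return raw != "false"


print(resolve_enable_sensitive_data(env={}))
```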

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add _ATTR_FOUNDRY_AGENT_TYPE constant
- Set agent_type='hosted' when FOUNDRY_HOSTING_ENVIRONMENT is set
- Only write attribute on spans with gen_ai.operation.name == invoke_agent
- Add 3 tests for agent_type scoping behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace request_span() with request_context() that extracts and attaches
incoming W3C trace context (traceparent/tracestate/baggage) without creating
a span. Framework spans created inside handlers are now parented directly
under the caller's span.

Changes:
- core/_tracing.py: Add request_context(), remove request_span()
- core/_base.py: Simplify AgentServerHost.request_context() wrapper
- invocations/_invocation.py: Remove span creation/attrs/end logic
- responses/_endpoint_handler.py: Same simplification
- Remove agent_type from enrichment processor (no invoke_agent span)
- Update all tests to validate context propagation without server span

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the weak status-code-only assertion with a test that creates a
span inside the handler and verifies trace ID and parent span ID match
the incoming traceparent header.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The request_context method was added in 2.0.0b4 (as part of the
invoke_agent span removal). Update invocations and responses packages
to require the correct minimum version.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Revert min dependency back to >=2.0.0b3 and add hasattr guards
so that invocations/responses gracefully degrade when running
against core 2.0.0b3 (which lacks request_context). This fixes
the mindependency CI check.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Creates a real OTel caller span, injects its trace context into
the request headers, creates a child span in the invocation handler,
and validates the handler span is correctly parented under the caller.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dd baggage tests

- Add invocation_id baggage-to-span-attribute mapping in _FoundryEnrichmentSpanProcessor.on_start
- Add core tests for invocation_id enrichment (from baggage, no baggage, child propagation)
- Add invocations test verifying SDK-set baggage (invocation_id, session_id) available in handler
- Add responses test verifying SDK-set baggage (response_id, conversation_id, streaming) available in handler
- Add invocations integration test verifying baggage entries stamped as span attributes via enricher

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tate

In CI environments where microsoft-opentelemetry distro is installed and
APPLICATIONINSIGHTS_CONNECTION_STRING is set, non-tracing tests would
trigger use_microsoft_opentelemetry() on the first server construction,
installing a global TracerProvider that breaks traceparent-propagation
tests.

Fix:
- Add session-scoped _prevent_distro_setup fixture in both invocations
  and responses conftest.py that mocks _setup_distro_export for all tests
- Pass configure_observability=None in conftest factory functions
- Pass configure_observability=None in test_tracing_disabled_by_default
  and test_no_tracing_when_no_endpoints

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace synthetic traceparent string with real OTel span + inject()
pattern. This ensures correct trace context propagation regardless of
which TracerProvider or auto-instrumentation (e.g. microsoft-opentelemetry)
is active in the CI environment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
singankit and others added 12 commits May 15, 2026 20:13
Temporarily remove BaggageMiddleware from the middleware stack to test
whether its context.attach() call is causing the NonRecordingSpan crash
in azure-ai-projects _responses_instrumentor.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the Starlette OTel instrumentor (which created noisy SERVER and
ASGI internal spans) and replace with a lightweight TraceContextMiddleware
that only extracts W3C traceparent/tracestate/baggage from incoming
requests. This ensures downstream spans (from MAF/agent-framework) are
children of the caller's trace without creating extra spans.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TraceContextTextMapPropagator was not importable from
opentelemetry.trace.propagation. Use the global propagate.extract()
instead which handles both TraceContext and Baggage propagation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
No longer needed since we replaced StarletteInstrumentor with our own
lightweight TraceContextMiddleware. Fixes CI 'Analyze dependencies'
failure (dependency not in shared_requirements.txt).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The _prevent_distro_setup fixture was blocking the Azure Monitor exporter
for ALL tests including E2E ones. Now checks the marker expression and
yields without patching when tracing_e2e tests are selected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TraceContextMiddleware now propagates W3C trace context automatically at
the middleware layer, so handlers no longer need to call request_context().
The method was removed from AgentServerHost in this branch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
NonRecordingSpan does not expose a public .context attribute in all
OpenTelemetry versions. Use get_span_context() which is the stable API
that works for all span types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Verifies that when an incoming request has NO traceparent/tracestate/baggage
headers (e.g. health checks, direct calls), spans created by downstream
frameworks like MAF are still properly exported to App Insights as new traces.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of matching by span name, capture the exact span_id and trace_id
from the created span and query App Insights by those IDs for precise
correlation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This test is a duplicate of test_span_parenting_in_appinsights and
test_span_emitted_without_incoming_trace_context. It fails intermittently
because it runs first and App Insights ingestion delay is longer for the
initial telemetry session. The same validation is covered by the later
tests which pass reliably.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Same App Insights ingestion timing issue as core — first test to run
suffers from cold-start delay > 300s. The span parenting test already
validates span export AND parent-child relationships (superset).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@singankit singankit force-pushed the feature/enable-a365-tracing branch from 7a54fd0 to 52a9cce Compare May 17, 2026 07:54
App Insights has a cold-start ingestion delay for the first telemetry
session sent to a resource — data can take 5+ minutes to become
queryable via KQL. This caused the first E2E test to always fail in CI.

Fix: Add a session-scoped autouse fixture that sends a dummy span and
polls until App Insights confirms ingestion (up to 360s). Real tests
then run against a 'warm' pipeline with fast ingestion.
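The warm-up loop reduces to polling with a deadline. In this sketch, wait_until_ingested is hypothetical and is_ingested stands in for the App Insights check (for example, a KQL query for the dummy span's ID), which the PR does not show; the 360 s default mirrors the commit. In a conftest.py it could be called from a session-scoped autouse fixture after the dummy span is sent.

```python
import time
from typing import Callable


def wait_until_ingested(
    is_ingested: Callable[[], bool],
    *,
    timeout_s: float = 360.0,
    interval_s: float = 10.0,
) -> bool:
    """Poll is_ingested until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ingested():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval_s, remaining))


calls = {"n": 0}


def fake_query() -> bool:
    # Hypothetical stand-in: "ingested" on the third poll.
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until_ingested(fake_query, timeout_s=5, interval_s=0.01), calls["n"])
```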

Also restores test_handler_span_in_appinsights which is no longer flaky.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@singankit singankit force-pushed the feature/enable-a365-tracing branch from 52a9cce to b7852ca Compare May 17, 2026 22:39
singankit and others added 5 commits May 17, 2026 15:43
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- azure-ai-agentserver-core: 2.0.0b4 -> 2.0.0b5
- azure-ai-agentserver-invocations: 1.0.0b4 -> 1.0.0b5
- azure-ai-agentserver-responses: 1.0.0b6 -> 1.0.0b7

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread sdk/agentserver/azure-ai-agentserver-core/CHANGELOG.md
singankit and others added 2 commits May 18, 2026 10:10
Merged new release notes into existing version entries (b4/b4/b6)
instead of creating separate b5/b5/b7 entries.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
yulin-li and others added 2 commits May 19, 2026 10:25
…6973)

Aligns the invocations_ws transport with the new tracing model on
feature/enable-a365-tracing: trace context propagation is handled by
the core TraceContextMiddleware, and user-created spans inside handlers
are correctly parented without a framework-generated root span.

- _invocation_ws.py: remove the per-connection 'websocket_session' span
  (request_span call, otel_span, _safe_set_attrs helper, end_span import,
  span_ctx finalize block). Drop the dead 'handler_exc' kwarg on
  _finalize_session and the 'error_message' kwarg on _emit_close_event
  (handler exception text never reaches the close-event log line, by
  design).
- _constants.py: drop unused ATTR_SPAN_ERROR_MESSAGE; relabel the section
  comment from 'Span attribute keys' to 'Structured-log extra keys'.
- CHANGELOG.md: trim the WS telemetry feature bullet to describe the
  log-only path; no breaking-change entry (the WS span was added and
  removed within the same unreleased 1.0.0b4).
- tests/test_ws_tracing.py: removed (17 span-assertion tests obsolete;
  close-event log line still covered by tests/test_ws_close_event.py).

186 tests pass, 2 skipped.

Co-authored-by: Yulin Li <yulili@microsoft.com>
@ankitbko ankitbko merged commit 74fb2fe into main May 19, 2026
34 checks passed
@ankitbko ankitbko deleted the feature/enable-a365-tracing branch May 19, 2026 04:38

Labels

Hosted Agents sdk/agentserver/*

4 participants